Learning for transliteration of Arabic-numeral expressions using decision tree for Korean TTS

Authors

  • Youngim Jung
  • Donghun Lee
  • HyeonSook Nam
  • Ae-sun Yoon
  • Hyuk-Chul Kwon
Abstract

Despite much work on TTS technologies and several TTS systems customized for Korean, current TTS systems still make many errors when transliterating non-alphabetic symbols such as Arabic numerals and text symbols. This paper proposes TLAN (Transliteration Learner for Arabic-Numeral expressions), which uses a decision tree to efficiently disambiguate the reading and meaning of Arabic Numeral Expressions (ANEs) in texts. To analyze and learn from the data, three phases of learning elements are proposed: patterns of Arabic numerals combined with text symbols, contextual features, and heuristic information, each classified according to the senses and sounds of ANEs. The corpus consists of news articles published from January 1, 2000 to December 31, 2001 in 10 major Korean newspapers. By learning the three phases of learning elements, the model achieves accuracies of 97.38% on the training set and 97.28% on the test set.
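
To make the idea of decision-tree disambiguation concrete, here is a minimal sketch in Python. It is not the TLAN implementation: the abstract does not specify the tree learner or the exact feature set, so this uses a plain ID3-style split (lowest weighted entropy) as a stand-in. The feature names `pattern` and `next`, the labels `native`, `sino`, and `digits`, and the eight toy examples are all assumptions made for illustration and are not drawn from the paper's corpus.

```python
from collections import Counter
import math

# Toy, hand-written examples (hypothetical, NOT the paper's data).
# "pattern": shape of the numeral expression; "next": the word that follows it.
DATA = [
    ({"pattern": "N", "next": "시"}, "native"),       # 3시, hours: native Korean reading
    ({"pattern": "N", "next": "개"}, "native"),       # 3개, general counter
    ({"pattern": "N", "next": "명"}, "native"),       # 3명, counter for people
    ({"pattern": "N", "next": "분"}, "sino"),         # 3분, minutes: Sino-Korean reading
    ({"pattern": "N", "next": "년"}, "sino"),         # 3년, years
    ({"pattern": "N", "next": "월"}, "sino"),         # 3월, months
    ({"pattern": "N-N-N", "next": ""}, "digits"),     # phone-number shape, read digit by digit
    ({"pattern": "N-N-N", "next": "번"}, "digits"),
]

def entropy(rows):
    """Shannon entropy of the label distribution in rows."""
    counts = Counter(label for _, label in rows)
    total = len(rows)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def split(rows, feature):
    """Group rows by their value for the given feature."""
    groups = {}
    for feats, label in rows:
        groups.setdefault(feats[feature], []).append((feats, label))
    return groups

def build(rows, features):
    """Recursively build a tree; a leaf is a label string."""
    labels = Counter(label for _, label in rows)
    majority = labels.most_common(1)[0][0]
    if len(labels) == 1 or not features:
        return majority

    def remainder(feature):
        # Weighted entropy after splitting on this feature (lower is better).
        return sum(len(g) / len(rows) * entropy(g)
                   for g in split(rows, feature).values())

    best = min(features, key=remainder)
    rest = [f for f in features if f != best]
    return {
        "feature": best,
        "default": majority,  # fallback for feature values unseen in training
        "children": {v: build(g, rest) for v, g in split(rows, best).items()},
    }

def classify(tree, feats):
    """Walk the tree until a leaf label is reached."""
    while isinstance(tree, dict):
        value = feats.get(tree["feature"])
        tree = tree["children"].get(value, tree["default"])
    return tree

TREE = build(DATA, ["pattern", "next"])
print(classify(TREE, {"pattern": "N", "next": "분"}))
```

With such a small training set the tree simply memorizes the following word, and any unseen value falls back to the majority label at that node. A realistic learner would need far more data and richer features, along the lines of the pattern, contextual, and heuristic elements the abstract describes.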

Similar resources

Automatic Transliteration and Back-transliteration by Decision Tree Learning

Automatic transliteration and back-transliteration across languages with drastically different alphabets and phoneme inventories, such as English/Korean, English/Japanese, English/Arabic, and English/Chinese, have practical importance in machine translation, cross-lingual information retrieval, and automatic bilingual dictionary compilation. In this paper, a bi-directional and to some exte...

Disambiguation Based on Wordnet for Transliteration of Arabic Numerals for Korean TTS

Transliteration of Arabic numerals is not easily resolved. Arabic numerals occur frequently in scientific and informative texts and deliver significant meanings. Since readings of Arabic numerals depend largely on their context, generating accurate pronunciation of Arabic numerals is one of the critical criteria in evaluating TTS systems. In this paper, (1) contextual, pattern, and arithmetic f...

Tree-based modeling of prosodic phrasing and segmental duration for Korean TTS systems

This study describes the tree-based modeling of prosodic phrasing, pause duration between phrases and segmental duration for Korean TTS systems. We collected 400 sentences from various genres and built a corresponding speech corpus uttered by a professional female announcer. The phonemic and prosodic boundaries were manually marked on the recorded speech, and morphological analysis, grapheme-to...

Phonetic normalization using z-score in segmental prosody estimation for corpus-based TTS system

Recently, corpus-based text-to-speech (CB-TTS) has been actively studied throughout the world. Statistical training methods are generally applied to prosodic rules in CB-TTS, and the classification and regression tree (CART) is one of the most widely used methods. In this paper, we present an efficient CART training approach based on z-score phonetic normalization. Our idea comes from the fact that...

A new method for extracting named entities in Classical Arabic

In Natural Language Processing (NLP) studies, developing resources and tools contributes to the breadth and effectiveness of research in each language. In recent years, Arabic Named Entity Recognition (ANER) has attracted the attention of NLP researchers because of its significant impact on other NLP tasks such as machine translation, information retrieval, question answering, query result...

Publication date: 2004